Duration normalization and hypothesis combination for improved spontaneous speech recognition
Authors
Abstract
When phone segmentations are known a priori, normalizing the duration of each phone has been shown to be effective in overcoming weaknesses in the duration modeling of Hidden Markov Models (HMMs). While we have observed potential relative reductions in word error rate (WER) of up to 34.6% with oracle segmentation information, it has been difficult to achieve significant WER improvements with segmentation boundaries that are estimated blindly. In this paper, we present simple variants of our duration normalization algorithm that use blindly estimated segmentation boundaries to produce different recognition hypotheses for a given utterance. These hypotheses can then be combined to yield significant WER improvements. With oracle segmentations, WER reductions of up to 38.5% are possible. With automatically derived segmentations, this approach has achieved WER reductions of 3.9% for the Broadcast News corpus, 6.2% for the spontaneous register of the MULT_REG corpus, and 7.7% for a spontaneous corpus of connected Spanish digits collected by Telefónica Investigación y Desarrollo.
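The abstract does not say how the hypotheses are combined. ROVER-style word voting is a common choice for this kind of system combination, and the Python sketch below is a simplified stand-in for it, not the authors' method. It assumes the hypotheses (for example, one per duration-normalization variant) have already been aligned slot by slot, skipping the alignment step that ROVER itself performs, and it adds a word-level WER function for scoring the result. The names `combine_by_vote` and `word_error_rate` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical representation (an assumption, not from the paper): every
# hypothesis is a list of word slots already aligned to the other hypotheses,
# with None marking "no word here" (a deletion relative to the others).
Slot = Optional[str]


def combine_by_vote(hypotheses: Sequence[Sequence[Slot]]) -> List[str]:
    """Majority vote per aligned slot; ties go to the entry seen first."""
    if not hypotheses or len({len(h) for h in hypotheses}) != 1:
        raise ValueError("need one or more hypotheses of equal (aligned) length")
    combined: List[str] = []
    for slot in zip(*hypotheses):
        winner, _count = Counter(slot).most_common(1)[0]
        if winner is not None:  # a winning deletion contributes no word
            combined.append(winner)
    return combined


def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # reference word deleted
                         cur[j - 1] + 1,          # extra word inserted
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = ["the", "cat", "sat", "down"]
    variants = [
        ["the", "cat", "sat", None],    # drops the last word
        ["the", "hat", "sat", "down"],  # substitutes one word
        ["a", "cat", "sat", "down"],    # substitutes a different word
    ]
    for v in variants:
        words = [w for w in v if w is not None]
        print(word_error_rate(reference, words))  # 0.25 for each variant
    combined = combine_by_vote(variants)
    print(combined, word_error_rate(reference, combined))
    # ['the', 'cat', 'sat', 'down'] 0.0
```

In the toy run each variant makes one error, but the errors fall in different slots, so the vote recovers the reference. Voting of this kind only helps when the combined hypotheses make different errors, which is why producing genuinely different hypotheses matters.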
Similar resources
Duration normalization for improved recognition of spontaneous and read speech via missing feature methods
Hidden Markov Models (HMMs) are known to model the duration of sound units poorly. In this paper we present a technique to normalize the duration of each phone to overcome this weakness, with the conjecture that speech with normalized phone durations may be better modeled and discriminated using standard HMM acoustic models. Duration normalization is accomplished by dropping frames if a phone i...
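The snippet above is cut off before it states the exact rule, so the following Python sketch is only an illustration under stated assumptions. Each phone segment (whose frame boundaries could be oracle or automatically estimated) is mapped to a fixed number of frames by uniform index sampling. This drops frames from phones longer than the target and, as an additional assumption not confirmed by the snippet, repeats frames in shorter ones. The function name `normalize_phone_durations` and the default target of 5 frames are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (e.g., cepstral coefficients)


def normalize_phone_durations(frames: Sequence[Frame],
                              segments: Sequence[Tuple[int, int]],
                              target_frames: int = 5) -> List[Frame]:
    """Resample every phone segment [start, end) to exactly target_frames frames.

    Assumptions (illustrative, not taken from the paper): uniform index
    sampling; long phones lose frames, short phones get frames repeated.
    """
    if target_frames < 1:
        raise ValueError("target_frames must be positive")
    normalized: List[Frame] = []
    for start, end in segments:
        length = end - start
        if start < 0 or length <= 0 or end > len(frames):
            raise ValueError(f"bad segment ({start}, {end})")
        for k in range(target_frames):
            # Integer arithmetic keeps the index inside [start, end).
            normalized.append(frames[start + (k * length) // target_frames])
    return normalized


if __name__ == "__main__":
    frames = [[float(i)] for i in range(12)]  # 12 one-dimensional frames
    segments = [(0, 10), (10, 12)]            # a 10-frame phone, a 2-frame phone
    out = normalize_phone_durations(frames, segments, target_frames=5)
    print([f[0] for f in out])
    # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 10.0, 10.0, 11.0, 11.0]
```

Because every phone ends up with the same length, duration differences between phones are removed before acoustic scoring. Errors in the segment boundaries feed directly into which frames are kept, which is one plausible reason why blindly estimated boundaries make gains harder to obtain.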
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Hierarchical duration modelling for speech recognition using the ANGIE framework
We describe a novel hierarchical duration model for speech recognition. The modelling scheme is based on the ANGIE framework, a flexible unified sublexical representation for speech applications. Our duration model captures contextual factors that influence the duration of sublexical units at multiple linguistic levels simultaneously, using both relative and absolute duration information. The modelling...
Word Level Timing in Spontaneous Japanese Speech
This study provides evidence against the hypothesis that Japanese has word level mora-timing. Unlike previous studies which used careful speech, this paper evaluates timing in a corpus of spontaneous Japanese speech from 11 speakers. Correlations between word duration and number of moras in the word are shown to be much lower than in careful speech studies. Furthermore, if there were durational...
Acoustic analysis and automatic recognition of spontaneous children's speech
This paper presents analyses, and recognition experiments, on spontaneous American English speech collected from children aged from 8 to 13 years. These analyses focused on variations in phone duration and on the scattering of phones in the acoustic space and were aimed at achieving a better understanding of spectral and temporal changes occurring in spontaneous speech produced by children of v...